<h1 id="references">References</h1>

<p>Agarwal, A., Bird, S., Cozowicz, M., Hoang, L., Langford, J., Lee, S., Li, J., Melamed, D., Oshri, G., Ribas, O., Sen, S., &amp; Slivkins, A. (2016). A Multiworld Testing Decision Service. <i>CoRR</i>, <i>abs/1606.03966</i>. https://arxiv.org/abs/1606.03966</p>

<p>Li, L., Chu, W., Langford, J., &amp; Schapire, R. E. (2010). A Contextual-Bandit Approach to Personalized News Article Recommendation. <i>CoRR</i>, <i>abs/1003.0146</i>. https://arxiv.org/abs/1003.0146</p>

<p>Horvitz, D. G., &amp; Thompson, D. J. (1952). A Generalization of Sampling Without Replacement from a Finite Universe. <i>Journal of the American Statistical Association</i>, <i>47</i>(260), 663–685. https://doi.org/10.1080/01621459.1952.10483446</p>

<p>Jiang, N., &amp; Li, L. (2016). Doubly Robust Off-policy Value Evaluation for Reinforcement Learning. <i>Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19–24, 2016</i>, 652–661. https://proceedings.mlr.press/v48/jiang16.html</p>

<p>Dudík, M., Langford, J., &amp; Li, L. (2011). Doubly Robust Policy Evaluation and Learning. <i>Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 – July 2, 2011</i>, 1097–1104. https://icml.cc/2011/papers/554_icmlpaper.pdf</p>

<p>Bietti, A., Agarwal, A., &amp; Langford, J. (2018). <i>A Contextual Bandit Bake-off</i>. arXiv:1802.04064v3 [stat.ML]. https://www.microsoft.com/en-us/research/publication/a-contextual-bandit-bake-off-2/</p>

<p>Karampatziakis, N., &amp; Langford, J. (2011). Online Importance Weight Aware Updates. <i>Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence</i>, 392–399. https://dl.acm.org/citation.cfm?id=3020548.3020594</p>

<p>Osband, I., &amp; Van Roy, B. (2015). Bootstrapped Thompson Sampling and Deep Exploration. <i>CoRR</i>, <i>abs/1507.00300</i>. https://arxiv.org/abs/1507.00300</p>

<p>Eckles, D., &amp; Kaptein, M. (2014). Thompson sampling with the online bootstrap. <i>CoRR</i>, <i>abs/1410.4009</i>. https://arxiv.org/abs/1410.4009</p>

<p>Agarwal, A., Hsu, D. J., Kale, S., Langford, J., Li, L., &amp; Schapire, R. E. (2014). Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. <i>CoRR</i>, <i>abs/1402.0555</i>. https://arxiv.org/abs/1402.0555</p>

<p>Cortes, D. (2018). Adapting multi-armed bandits policies to contextual bandits scenarios. <i>CoRR</i>, <i>abs/1811.04383</i>. https://arxiv.org/abs/1811.04383</p>

<p>Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A., &amp; Vishwanathan, S. V. N. (2009). Hash Kernels for Structured Data. <i>J. Mach. Learn. Res.</i>, <i>10</i>, 2615–2637. https://dl.acm.org/citation.cfm?id=1577069.1755873</p>

<p>Weinberger, K. Q., Dasgupta, A., Attenberg, J., Langford, J., &amp; Smola, A. J. (2009). Feature Hashing for Large Scale Multitask Learning. <i>CoRR</i>, <i>abs/0902.2206</i>. https://arxiv.org/abs/0902.2206</p>

<p>Agarwal, A., Chapelle, O., Dudík, M., &amp; Langford, J. (2011). A Reliable Effective Terascale Linear Learning System. <i>CoRR</i>, <i>abs/1110.4198</i>. https://arxiv.org/abs/1110.4198</p>

<p>Swaminathan, A., Krishnamurthy, A., Agarwal, A., Dudík, M., Langford, J., Jose, D., &amp; Zitouni, I. (2016). Off-policy evaluation for slate recommendation. <i>CoRR</i>, <i>abs/1605.04812</i>. https://arxiv.org/abs/1605.04812</p>
